SYSTEM FOR SYNCHRONIZATION BETWEEN MOVING PICTURE AND A TEXT-TO-SPEECH CONVERTER

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a system for synchronization between moving picture and a text-to-speech (TTS) converter, and more particularly to a system for synchronization between moving picture and a text-to-speech converter which can realize synchronization between moving picture and synthesized speech by using the moving time of the lip and duration of speech information.

2. Description of the Related Art

In general, a speech synthesizer provides a user with various types of information in an audible form. For this purpose, the speech synthesizer should provide a high quality speech synthesis service from the input texts given to a user. In addition, in order for the speech synthesizer to be operatively coupled to a database constructed in a multi-media environment or various media provided by a counterpart involved in a conversation, the speech synthesizer should generate a synthesized speech so as to be synchronized with these media. In particular, the synchronization between moving picture and the TTS is essentially required to provide a user with a high quality service.

FIG. 1 shows a block diagram of a conventional text-to-speech converter, which generally consists of three steps in generating a synthesized speech from the input text.

At step 1, a language processing unit 1 converts an input text to a phoneme string, estimates prosodic information, and symbolizes it. The symbol of the prosodic information is estimated from the phrase boundary, clause boundary, accent position, sentence patterns, etc. by analyzing a syntactic structure. At step 2, a prosody processing unit 2 calculates the values for prosody control parameters from the symbolized prosodic information by using rules and tables. The prosody control parameters include phoneme duration and pause interval information. Finally, a signal processing unit 3 generates a synthesized speech by using a synthesis unit DB 4 and the prosody control parameters. That is, the conventional synthesizer must estimate prosodic information related to naturalness and speaking rate only from an input text in the language processing unit 1 and the prosody processing unit 2.

Presently, a great deal of research on the TTS has been conducted throughout the world for application to mother languages, and some countries have already started a commercial service. However, the conventional synthesizer is aimed at synthesizing a speech from an input text, and thus there is no research activity on a synthesizing method which can be used in connection with multi-media. In addition, when dubbing is performed on moving picture or animation by using the conventional TTS method, information required to implement the synchronization of media with a synthesized speech cannot be estimated from the text only. Thus, it is not possible to generate a synthesized speech, which is smoothly and operatively coupled to moving pictures, from only text information.

If the synchronization between moving picture and a synthesized speech is assumed to be a kind of dubbing, there can be three implementation methods. One of these methods synchronizes moving picture with a synthesized speech on a sentence basis. This method regulates the time duration of the synthesized speech by using information on the start point and end point of the sentence. This method has the advantage that it is easy to implement and the additional efforts can be minimized. However, smooth synchronization cannot be achieved with this method. As an alternative, there is a method wherein information on the start point and end point, and the phoneme symbol, for every phoneme are recorded in the interval of the moving picture related to a speech signal, to be used in generating a synthesized speech. Since the synchronization of moving picture with a synthesized speech can be achieved for each phoneme with this method, the accuracy can be enhanced. However, this method has the disadvantage that additional efforts must be exerted to detect and record time duration information for every phoneme in a speech interval of the moving picture.

As another alternative, there is a method wherein synchronization information is recorded based on patterns having a characteristic by which a lip motion can be easily distinguished, such as the start and end points of the speech, the opening and closing of the lip, protrusion of the lip, etc. This method can enhance the efficiency of synchronization while minimizing the additional efforts exerted to make information for synchronization.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a method of formatting and normalizing continuous lip motions to events in a moving picture besides a text in a text-to-speech converter.

It is another object of the invention to provide a system for synchronization between moving picture and a synthesized speech by defining an interface between event information and the TTS and using it in generating the synthesized speech.

In accordance with one aspect of the present invention, a system for synchronization between moving picture and a text-to-speech converter is provided which comprises distributing means for receiving multi-media input information, transforming it into the respective data structures, and distributing it to each medium; image output means for receiving image information of the multi-media information from said distributing means; language processing means for receiving language texts of the multi-media information from said distributing means, transforming the text into a phoneme string, and estimating and symbolizing prosodic information; prosody processing means for receiving the processing results from said language processing means and calculating the values of prosodic control parameters; synchronization adjusting means for receiving the processing results from said prosody processing means, adjusting time durations for every phoneme for synchronization with image signals by using synchronization information of the multi-media information from said distributing means, and inserting the adjusted time durations into the results of said prosody processing means; signal processing means for receiving the processing results from said synchronization adjusting means to generate a synthesized speech; and a synthesis unit database block for selecting the required unit for synthesis in accordance with a request from said signal processing means, and transferring the required data.
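The duration adjustment performed by the synchronization adjusting means can be illustrated with a minimal sketch. The text does not state the adjustment rule, so the proportional scaling used here is an assumption, as are the names `Phoneme`, `adjust_durations` and `scenes_to_ms`, the treatment of one scene as one video frame, and the 30 frames-per-second default.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phoneme:
    symbol: str
    duration_ms: float  # duration calculated by the prosody processing stage


def scenes_to_ms(num_scenes: int, frames_per_second: float = 30.0) -> float:
    """Convert a count of scenes to milliseconds.

    Assumption: one scene is one video frame, shown at 30 fps by default.
    """
    return num_scenes * 1000.0 / frames_per_second


def adjust_durations(phonemes: List[Phoneme], target_ms: float) -> List[Phoneme]:
    """Scale every phoneme duration so that the total equals target_ms.

    Proportional scaling is an illustrative assumption, not a rule
    taken from the text.
    """
    total = sum(p.duration_ms for p in phonemes)
    if total <= 0 or target_ms <= 0:
        raise ValueError("total duration and target must be positive")
    scale = target_ms / total
    # Return new objects so the prosody results are left unmodified.
    return [Phoneme(p.symbol, p.duration_ms * scale) for p in phonemes]
```

For example, two phonemes of 80 ms and 40 ms that must fill an interval of 6 scenes would be stretched to fill `scenes_to_ms(6)` milliseconds while keeping their 2:1 ratio.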



5,970,459 


BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will become more apparent upon a detailed description of the preferred embodiments for carrying out the invention as rendered below. In the description to follow, references will be made to the accompanying drawings, where like reference numerals are used to identify like or similar elements in the various drawings and in which:

FIG. 1 shows a block diagram of a conventional text-to-speech converter;

FIG. 2 shows a block diagram of a synchronization system in accordance with the present invention;

FIG. 3 shows a detailed block diagram to illustrate a method of synchronizing a text-to-speech converter; and

FIG. 4 shows a flow chart to illustrate a method of synchronizing a text-to-speech converter.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 2 shows a block diagram of a synchronization system in accordance with the present invention. In FIG. 2, reference numerals 5, 6, 7, 8 and 9 indicate a multi-data input unit, a central processing unit, a synthesized database, a digital/analog (D/A) converter, and an image output unit, respectively.

Data comprising multi-media such as an image, text, etc. is inputted to the multi-data input unit 5, which outputs the input data to the central processing unit 6. The algorithm in accordance with the present invention is embedded in the central processing unit 6. The synthesized database 7, a synthesized DB for use in the synthesis algorithm, is stored in a storage device and transmits necessary data to the central processing unit 6. The digital/analog converter 8 converts the synthesized digital data into an analog signal to output it to the exterior. The image output unit 9 displays the input image information on the screen.

Table 1 as shown below illustrates one example of structured multi-media input information to be used in connection with the present invention. The structured information includes a text, moving picture, lip shape, information on positions in the moving picture, and information on the time duration. The lip shape can be transformed into numerical values based on a degree of a down motion of a lower lip, up and down motion at the left edge of an upper lip, up and down motion at the right edge of an upper lip, up and down motion at the left edge of a lower lip, up and down motion at the right edge of a lower lip, up and down motion at the center portion of an upper lip, up and down motion at the center portion of a lower lip, degree of protrusion of an upper lip, degree of protrusion of a lower lip, distance from the center of a lip to the right edge of a lip, and distance from the center of a lip to the left edge of a lip. The lip shape can also be defined in a quantified and normalized pattern in accordance with the position and manner of articulation for each phoneme. The information on positions is defined by the position of a scene in a moving picture, and the time duration is defined by the number of the scenes in which the same lip shape is maintained.

TABLE 1

Example of Synchronization Information

Input Information            Parameter                Parameter Value

text                         sentence

moving picture               scene

synchronization information  lip shape                degree of a down motion of a lower lip,
                                                      up and down motion at the left edge of an upper lip,
                                                      up and down motion at the right edge of an upper lip,
                                                      up and down motion at the left edge of a lower lip,
                                                      up and down motion at the right edge of a lower lip,
                                                      up and down motion at the center portion of an upper lip,
                                                      up and down motion at the center portion of a lower lip,
                                                      degree of protrusion of an upper lip,
                                                      degree of protrusion of a lower lip,
                                                      distance from the center of a lip to the right edge of a lip,
                                                      and distance from the center of a lip to the left edge of a lip

                             information on position  position of scene in moving picture

                             time duration            number of continuous scenes

FIG. 3 shows a detailed block diagram to illustrate a method of synchronizing a text-to-speech converter and FIG. 4 shows a flow chart to illustrate a method of synchronizing a text-to-speech converter. In FIG. 3, reference numerals 10, 11, 12, 13, 14, 15, 16 and 17 indicate a multi-media information input unit, a multi-media distributor, a standardized language processing unit, a prosody processing unit, a synchronization adjusting unit, a signal processing unit, a synthesis unit database, and an image output unit, respectively.

The multi-media information in the multi-media information input unit 10 is structured in a format as shown above in Table 1, and comprises a text, moving picture, lip shape, information on positions in the moving picture, and information on time durations. The multi-media distributor 11 receives the multi-media information from the multi-media information input unit 10, and transfers images and texts of the multi-media information to the image output unit 17 and the language processing unit 12, respectively. When the synchronization information is transferred, it is converted into a data structure which can be used in the synchronization adjusting unit 14.

The language processing unit 12 converts the texts received from the multi-media distributor 11 into a phoneme string, and estimates and symbolizes prosodic information to transfer it to the prosody processing unit 13. The symbols for the prosodic information are estimated from the phrase boundary, clause boundary, the accent position, and sentence pattern, etc. by using the results of analysis of syntax structures.

The prosody processing unit 13 receives the processing results from the language processing unit 12, and calculates the values of the prosodic control parameters. The prosodic control parameters include the time duration of phonemes, contour of pitch, contour of energy, position of pause, and length. The calculated results are transferred to the synchronization adjusting unit 14.
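The position and time duration entries of Table 1 can be illustrated with a short sketch that groups consecutive scenes showing the same lip shape into one event. This is only one possible reading of the table: the names `LipShape`, `SyncEvent` and `encode_events` are hypothetical, and representing a quantified lip shape as a tuple of numbers, with exact equality deciding whether the shape is "maintained", is an assumption.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical representation: the quantified lip-shape values for one scene.
LipShape = Tuple[float, ...]


@dataclass
class SyncEvent:
    lip_shape: LipShape
    position: int  # index of the first scene showing this lip shape
    duration: int  # number of continuous scenes with the same lip shape


def encode_events(shapes: Sequence[LipShape]) -> List[SyncEvent]:
    """Collapse a per-scene sequence of lip shapes into events."""
    events: List[SyncEvent] = []
    position = 0
    # groupby yields one group per run of equal consecutive lip shapes.
    for shape, run in groupby(shapes):
        length = sum(1 for _ in run)
        events.append(SyncEvent(shape, position, length))
        position += length
    return events
```

Given six scenes in which a closed lip is held for two scenes, an open lip for three, and a closed lip for one, `encode_events` yields three events at positions 0, 2 and 5 with durations 2, 3 and 1.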



